Genome-wide analysis of epistasis in body mass
index using multiple human populations 

Wen-Hua Wei*,1, Gib Hemani 2, Attila Gyenesei 3, Veronique Vitart 1, Pau Navarro 1, Caroline Hayward 1,
Claudia P Cabrera 1, Jennifer E Huffman 1, Sara A Knott 4, Andrew A Hicks 5, Igor Rudan 6,7,
Peter P Pramstaller 5,8,9, Sarah H Wild 6, James F Wilson 6, Harry Campbell 6, Nicholas D Hastie 1,
Alan F Wright 1 and Chris S Haley 1,2

We surveyed gene-gene interactions (epistasis) in human body mass index (BMI) in four European populations (n<1200) via
exhaustive pair-wise genome scans, where interactions were computed as F ratios by testing a linear regression model fitting
two single-nucleotide polymorphisms (SNPs) with interactions against the one without. Before the association tests, BMI was
corrected for sex and age, normalised and adjusted for relatedness. Neither single SNPs nor SNP interactions were genome-wide
significant in any cohort based on the consensus threshold (P=5.0E-08) and a Bonferroni-corrected threshold (P=1.1E-12),
respectively. Next we compared sub genome-wide significant SNP interactions (P<5.0E-08) across cohorts to identify common
epistatic signals, where SNPs were annotated to genes to test for gene ontology (GO) enrichment. Among the epistatic genes
contributing to the commonly enriched GO terms, 19 were shared across study cohorts, of which 15 are previously published
genome-wide association loci, including CDH13 (cadherin 13) associated with height and SORCS2 (sortilin-related VPS10
domain containing receptor 2) associated with circulating insulin-like growth factor 1 and binding protein 3. Interactions
between the 19 shared epistatic genes and those involving BMI candidate loci (P<5.0E-08) were tested across cohorts;
eight replicated at the SNP level (P<0.05) in at least one cohort. These were further tested and showed limited
replication in a separate European population (n>5000). We conclude that genome-wide analysis of epistasis in multiple
populations is an effective approach to provide new insights into the genetic regulation of BMI but requires additional efforts
to confirm the findings.
INTRODUCTION

Body mass index (BMI) is the most commonly used anthropometric 
method to define human obesity. BMI is a complex trait affected by 
many environmental (eg, diet, age, physical activity) and genetic 
factors, with heritability estimates that vary from 40-80% in twin 
studies, 20-50% in family studies and 20-60% in adoption studies. 1 
Recent genome-wide association (GWA) studies have successfully 
identified numerous single-nucleotide polymorphisms (SNPs) that 
are robustly associated with obesity-related traits, including BMI. 2-4
They shed light on the biological basis of obesity and suggest a role for 
neuronal influences on the regulation of appetite and/or energy 
balance. However, the identified genetic variants jointly explained 
only a small proportion of the trait variation and thus had limited 
predictive value for obesity risk. 5 For example, in a recent meta- 
analysis (249 796 individuals) 32 identified and replicated SNPs 
together explained only 1.45% of the inter-individual variation 
in BMI where the strongest SNP accounted for just 0.34% of the 
variance. 3 The 32 BMI SNPs map to 32 different genes that are 
referred to as BMI loci hereafter. 



Gene-gene interactions (epistasis) are thought to be potential 
sources of the unexplained genetic variation, 6-8 but they remain 
largely unexplored in the GWA studies conducted for BMI so far. 
A major hurdle for analysing epistasis in GWA studies was the lack of 
fast methods to enumerate billions of interaction tests in a full pair- 
wise genome scan to map different types of epistasis (eg, with or 
without main effects) while keeping false-positive rates under
control. 9,10 Another hurdle for studying epistasis is the relatively small
sample size in many existing GWA cohorts, which may limit the power of
detection and replication of epistasis signals unless the epistatic effects
to be detected are large. 11,12 It was shown in simulation that more
than 4000 case-control pairs were needed to achieve 80% power of
detection of epistasis with an odds ratio of 3.0 in complex diseases. 13
For quantitative traits, sample sizes need to be substantially (eg, 45%)
larger than for case-control phenotypes to achieve a similar power. 14

With the advances in computing technologies, the major hurdle is
gradually easing and full pair-wise genome scans are beginning to
be applied to GWA populations individually. 15,16 Meta-analysis of
epistasis as applied in GWA studies 3 could be a good way to overcome
the sample size hurdle but requires new methods to accommodate
imputed SNP genotype data. Various approaches to search space
reduction (ie, less stringent significance thresholds as a result of the
much reduced number of tests) can be applied to improve the power
of detection of epistasis in individual GWA populations. 11 Testing
interactions involving genome-wide significant loci (of marginal
effects) with a threshold corrected for the actual number of tests has
been suggested 10,17,18 and applied successfully in recent studies. 16,19-21
Another approach is to select SNPs based on existing biological
knowledge (eg, protein-protein interactions) and test interactions
among them only. 22,23 However, caution should be taken when
making the SNP selection 12 because biological knowledge may
not be directly related to the trait studied and any biases in the
pre-identified loci could lead to false-positive epistatic signals.

Here we demonstrate a different approach to exploit the value of 
genome-wide analysis of epistasis using multiple populations. First we 
performed full pair-wise genome scans for epistasis in BMI in four 
GWA populations to which we had direct access: the Scottish 
ORCADES, 24 the CROATIA-Vis 25 and CROATIA-Korcula, 26 and the
Italian MICROS 27 study cohorts. Each of these cohorts has a relatively 
small sample size and is sampled from distinct European regions with 
widely differing lifestyles and diets. Second, we identified common 
and potentially important gene-gene interactions using the epistasis 
signals uncovered in each cohort and their gene ontology (GO) 
enrichment across populations. In addition, we also identified a set 
of interactions involving the BMI loci (as prior knowledge) in different 
cohorts. Third, we tested the identified interactions in each cohort for
replication and then tested the replicated signals in the Northern Finland
Birth Cohort 1966 (NFBC1966). 28 We aim to address the question of
whether epistasis analysis is of value for the dissection of the genetic
regulation of BMI in these study cohorts.

MATERIALS AND METHODS 

Study cohorts and ethics statement 

The four study cohorts have been described in detail elsewhere. 24-27,29 Briefly,
the Scottish ORCADES cohort was recruited from a subgroup of 10 islands of
the archipelago of Orkney. This study was approved by the NHS Orkney
Research Ethics Committee and the North of Scotland REC. The CROATIA-Vis
and CROATIA-Korcula cohorts were recruited from the island of Vis and the 
island of Korcula, respectively. Both studies were approved by the Ethical 
Committee of the Medical School, University of Zagreb and the Multi-Centre 
Research Ethics Committee for Scotland. The Italian MICROS cohort was 
recruited from villages in an isolated highland area of the South Tyrol. The 
study was approved by the ethical committee of the Autonomous Province of 
Bolzano. All participants gave written informed consent and were measured for 
a number of traits, including weight and height from which BMI values were 
calculated. 



DNA samples were genotyped with Illumina Infinium HumanHap300v1/v2
(for CROATIA-Vis by the Wellcome Trust Clinical Facility in Edinburgh, UK) or
HumanCNV370v1 SNP bead microarrays (for CROATIA-Korcula, ORCADES
and MICROS by the Helmholtz Zentrum München in Munich, Germany) and
analysed using the BeadStudio software (Illumina). Quality control of the
genotype data was performed for each cohort using the R/GenABEL package
(Version 1.6-7) 30 based on a common set of criteria: individual call rate at 97%,
SNP call rate at 95%, P-value for deviation from Hardy-Weinberg equilibrium
at 1.0e-10, minor allele frequency at 2%. The NFBC1966 data were provided by
the database of Genotype and Phenotype (dbGaP) via specific Data Use
Certification and used as the replication cohort. NFBC1966 includes nearly
all individuals born in 1966 in the two northernmost Finnish provinces, who
were genotyped with HumanCNV370v1 SNP bead microarrays, 28 and was put
through the same quality control procedure as above. The summary information
of each cohort after quality control and excluding individuals without BMI
or age records or with extremely high BMI (ie, BMI > 50 kg/m²) is given in
Table 1.
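The SNP-level part of these criteria can be illustrated with a minimal Python sketch. It is not the GenABEL procedure: the function names are hypothetical, the thresholds are read as minimum values (the text gives only the cut-offs), the individual call rate filter is left out, and a 1-df chi-square test stands in for whatever Hardy-Weinberg test GenABEL applies.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0                    # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(genotypes, min_call_rate=0.95, min_maf=0.02, min_hwe_p=1.0e-10):
    """genotypes: list of 0/1/2 copies of one allele, None for a missing call.

    Thresholds are assumed to be minimum values (an interpretation of the text).
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    freq_b = (n_ab + 2 * n_bb) / (2 * len(called))
    if min(freq_b, 1.0 - freq_b) < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p
```

A marker is kept only if it clears all three filters; the order of the checks does not affect the outcome.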

Statistical analysis 

The raw BMI data in each of the four study cohorts were corrected for age and
sex and normalised using the rntransform function that is implemented in the
GenABEL package, performing quantile normalisation of residuals from a
generalised linear model analysis. The normalised BMI residuals were then
analysed using a linear mixed model to correct for polygenic effects due to
relatedness using the polygenic function in the GenABEL package and the
resultant environmental residuals (ie, pgresidualY in GenABEL) were used as
the trait to test for association. 31 The polygenic heritability was estimated
at the mixed-model step. Following the original GWA study, 28 in the
NFBC1966 cohort individuals with pregnancy and/or self-reported BMI
measures were excluded, and the raw BMI values were corrected for the
SexOCPG factor (recoded according to gender, status of taking oral
contraception and pregnancy) and then normalised and corrected for relatedness
as above.
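The normalisation step can be sketched in Python as a rank-based inverse normal transform. This is an illustration only: it takes residuals that have already been adjusted for age and sex, uses the common (rank - 0.5)/n offset, and the function name is hypothetical; the exact offsets used inside GenABEL may differ.

```python
from statistics import NormalDist

def rank_inverse_normal(values):
    """Map values to standard normal quantiles by rank; ties share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0   # 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    # (rank - 0.5) / n always lies strictly between 0 and 1.
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

The transform keeps the ordering of individuals but forces the trait onto a normal scale, which protects the later F tests from skewed BMI distributions.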

A single-SNP based GWA scan was performed in each population using a
score test method (based on the additive model) implemented in the mmscore
function in the GenABEL package. The consensus GWA threshold of
7.3 (-log10(5.0E-08)) was applied to identify GWA significant SNPs. 32 We
also performed a full pair-wise genome scan using the regression models
described below. Considering a pair of SNPs denoted as SNP1 and SNP2, the
following genetic models were used to detect epistasis, where the genotypes of each
SNP (ie, homozygote of the minor allele, homozygote of the major allele and
heterozygote) were fitted as fixed factors:

Model 1: $y = \mu + \mathrm{SNP}_1 + \mathrm{SNP}_2 + \mathrm{SNP}_1 * \mathrm{SNP}_2 + e$ (two SNPs with interaction)
Model 2: $y = \mu + \mathrm{SNP}_1 + \mathrm{SNP}_2 + e$ (two SNPs without interaction)
Model 3: $y = \mu + e$ (NULL model)

where y is the trait of interest, μ is the model constant, SNP1 (or SNP2) is a
fixed factor with three levels (genotype classes), SNP1*SNP2 is the interaction
term and e is the random error term. The F ratio test of Model 1 against Model 3
evaluates the whole pair effect, including interaction (ie, F_pair, 8 degrees of
freedom). The F ratio test of Model 1 against Model 2 evaluates the interaction
between the two SNPs (ie, F_int, 4 degrees of freedom). P-values were calculated
based on the F distribution with relevant degrees of freedom and transformed
to the -log10 scale (ie, -log10 P_pair for the F_pair test, -log10 P_int for the F_int test).
We were concerned mainly with the F_int tests in this study.

Table 1 Summary information of uncorrected BMI (kg/m²) a

                    Vis          Korcula      ORCADES      MICROS       NFBC1966
N                   901          880          695          1177         5071
Male/female ratio   0.74         0.56         0.87         0.78         1.0
Age range (years)   18-91        18-98        18-92        18-88        31
Number of SNPs      300265       307712       309202       293913       323697
BMI median          27.10        27.70        27.08        24.89        24.00
BMI mean±SD         27.30±4.19   27.93±4.05   27.76±4.86   25.51±4.57   24.70±4.24
BMI heritability    0.356        0.399        0.514        0.450        0.216
GWA lambda          1.001        0.999        0.997        0.996        1.005

a Vis and Korcula represent CROATIA-Vis and CROATIA-Korcula, respectively.
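The F ratio test of Model 1 against Model 2 described in the Statistical analysis can be sketched in plain Python as follows. This is a minimal illustration, not the software used for the genome scans: the helper names are hypothetical, genotypes are assumed to be coded 0, 1 and 2, all nine genotype combinations must be present, and the final conversion of F to a P-value is left out (a statistics library can supply it, for example scipy.stats.f.sf with 4 and n - 9 degrees of freedom).

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design matrix."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def dummies(g):
    """Two indicator columns for a three-level genotype coded 0, 1, 2."""
    return [1.0 if g == 1 else 0.0, 1.0 if g == 2 else 0.0]

def f_interaction(snp1, snp2, y):
    """F ratio (4 and n - 9 df) of Model 1 against Model 2.

    Assumes every one of the nine genotype combinations is observed.
    """
    additive, full = [], []
    for g1, g2 in zip(snp1, snp2):
        d1, d2 = dummies(g1), dummies(g2)
        main = [1.0] + d1 + d2                                # Model 2 columns
        additive.append(main)
        full.append(main + [a * b for a in d1 for b in d2])   # plus 4 interaction columns
    rss_additive, rss_full = rss(additive, y), rss(full, y)
    df_int, df_resid = 4, len(y) - 9
    return ((rss_additive - rss_full) / df_int) / (rss_full / df_resid)
```

Fitting genotypes as three-level factors, rather than as allele counts, is what gives the interaction term its 4 degrees of freedom and lets the test pick up non-additive patterns.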

Genome-wide significance thresholds (all on the -log10 scale) were derived
based on Bonferroni correction for multiple tests, that is, the 5% nominal P
value corrected by the number of tests performed. Considering 300 000 SNPs,
a full pair-wise genome scan performs 4.5E+10 association tests and thus the
5% genome-wide threshold is 11.95 (ie, -log10(0.05/4.5E+10)). After each
pair-wise genome scan, results were evaluated using the predefined threshold to
identify genome-wide significant interaction signals. Each SNP in the results
was annotated to the nearest gene within a window of 20 kilobases flanking the
SNP (based on the physical distance to either the start or end of transcription of
a gene; the distance is considered as zero if the SNP is within a gene).
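The threshold arithmetic can be checked with a few lines of Python. The function names are illustrative only; the numbers are those quoted in the text.

```python
import math

def pairwise_tests(n_snps):
    """Number of distinct SNP pairs in a full pair-wise scan."""
    return n_snps * (n_snps - 1) // 2

def bonferroni_threshold(n_tests, alpha=0.05):
    """Bonferroni-corrected significance threshold on the -log10 scale."""
    return -math.log10(alpha / n_tests)

n_pairs = pairwise_tests(300_000)
print(n_pairs)                                   # 44999850000, about 4.5E+10
print(round(bonferroni_threshold(4.5e10), 2))    # 11.95
print(round(-math.log10(5.0e-08), 2))            # 7.3, the consensus GWA threshold
```

The gap between 7.3 and 11.95 is the cost of moving from single-SNP tests to all SNP pairs.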

A GO enrichment analysis was conducted for each study cohort using
the running mode of 'Two unranked lists of genes' in GOrilla 33 based on the
standard hypergeometric statistics, where the annotated epistatic genes were
used as the target with the full list of human genes as the background. For
simplicity, we chose to use the same -log10 P value as the consensus GWA
threshold (ie, -log10 P_int > 7.3) to select SNP pairs of each cohort and used
their gene annotations as the input for the GO enrichment analysis. The GO
terms enriched (P<1.0E-03) were compared across study cohorts to identify
firstly common GO terms and then their member genes shared by the cohorts.
The shared epistatic genes were examined further for biological functions via
literature mining and their associated interactions in the retained results of each
cohort to identify potentially important interactions for replication tests. The
SNP pairs involving BMI loci (-log10 P_int > 7.3) in each study cohort were also
identified as potentially important interaction signals for replication tests.
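For a single GO term, the hypergeometric enrichment P-value can be sketched as below. This only illustrates the statistic named above and is not GOrilla's implementation; the function and parameter names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_target, n_overlap):
    """P(X >= n_overlap) when n_target genes are drawn without replacement
    from n_background genes, n_annotated of which carry the GO term."""
    total = comb(n_background, n_target)
    upper = min(n_annotated, n_target)
    tail = sum(comb(n_annotated, k) * comb(n_background - n_annotated, n_target - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

For example, with a background of 10 genes, 3 of them annotated to a term, and a target list of 4 genes containing 2 annotated ones, the call hypergeom_enrichment_p(10, 3, 4, 2) returns 70/210, that is, one third.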

Genome-wide significant SNP pairs and those identified as potentially
important interactions were tested for replication across the four study cohorts.
The replicated SNP pairs were further tested for replication in the NFBC1966
cohort. Each replication test was done at both the SNP and region
levels. At the SNP level, each replicated SNP was exactly the same as the
corresponding epistatic SNP and thus the 5% nominal significance threshold
(ie, -log10(0.05) = 1.30) was used because only one replication test was needed.
At the region level, interactions between each of the 10 adjacent SNPs (ie, five
upstream and five downstream) of the first epistatic SNP, together with that SNP
itself, and each of those of the second were tested, to accommodate the situation
where multiple SNPs may tag the same mutation of a gene. Permutation was used to
derive significance thresholds for replication of each epistatic pair at the region
level, where phenotypes were permuted and the highest -log10 P_int value of 121
(ie, 11x11) interaction tests was recorded in each of 1000 iterations. The replicated SNP
pairs were fitted together into the full model to calculate the proportion of
phenotypic variance explained in each study cohort.
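The permutation scheme for the region-level thresholds can be sketched as follows. This is a schematic under stated assumptions: region_scores is a hypothetical callback that returns the -log10 P_int values of all interaction tests in a region (121 in the text) for a given phenotype vector, and the choice of empirical quantile is one reasonable reading of "5% threshold".

```python
import random

def region_threshold(phenotype, region_scores, n_iter=1000, alpha=0.05, seed=1):
    """Empirical threshold for the best of several interaction tests in a region.

    region_scores(phenotype) is a hypothetical callback returning one
    -log10 P value per interaction test.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)                     # break the genotype-phenotype link
        maxima.append(max(region_scores(shuffled)))
    maxima.sort()
    # Value exceeded by at most a fraction alpha of the permuted maxima.
    index = min(n_iter - 1, int((1.0 - alpha) * n_iter))
    return maxima[index]
```

Recording the maximum over all tests in each permutation accounts for the multiple, correlated tests within a region without assuming they are independent.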

RESULTS 

The mean BMI was similar across the CROATIA-Vis, CROATIA-Korcula
and ORCADES cohorts but lower in MICROS (Table 1). The
polygenic heritability estimates varied from 0.356 (CROATIA-Vis) to
0.514 (ORCADES). Conventional GWA scans found no genome-wide
significant SNPs in any single cohort. The inflation factor lambda
(computed by regression of observed association P-values against the
expected) of each GWA scan was very close to 1 (Table 1), suggesting
the family relatedness in each cohort was well accounted for. Only
8 out of the 32 BMI SNPs previously identified 3 were genotyped in the
four study cohorts and none of these showed a strong association with
BMI (Supplementary Table S1).
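As an illustration of what the inflation factor measures, the sketch below computes the widely used median-based estimator. It is a different estimator from the regression-based one reported in Table 1 and is shown only because it is compact; the function name is hypothetical and the P-values are assumed to come from 1-df tests and to lie in (0, 1].

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Median-based genomic inflation factor from 1-df association P-values."""
    normal = NormalDist()
    # Convert each two-sided P-value back to a 1-df chi-square statistic.
    chi2 = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = normal.inv_cdf(0.25) ** 2   # about 0.455 under the null
    return median(chi2) / expected_median
```

Values close to 1, as in Table 1, indicate that the test statistics follow their null distribution, whereas unmodelled relatedness would push the factor above 1.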

Full pair-wise genome scans found no SNP pairs that passed the
genome-wide threshold (-log10 P_int = 11.95) in any of the four study
cohorts (Figure 1). Considering interaction signals with -log10 P_int
> 7.3, MICROS had the fewest SNP pairs and consequently
the fewest annotated genes, whereas the remaining three
cohorts had relatively similar numbers of SNP pairs and annotated
genes (Table 2). Five out of the 32 BMI loci (but not the BMI SNPs)
were involved in 7 epistatic pairs in CROATIA-Vis: FTO, KCTD15,
LRP1B, NEGR1 and PRKD1. Similarly, three BMI loci (NEGR1,
NRXN3 and PRKD1) were involved in CROATIA-Korcula, two
(FTO and MTCH2) in ORCADES and two (FTO and LRP1B) in
MICROS.

Figure 1 Pair-wise epistatic signals in each study cohort. (a) Pairwise epistatic signals in CROATIA-Vis. (b) Pairwise epistatic signals in CROATIA-Korcula.
(c) Pairwise epistatic signals in ORCADES. (d) Pairwise epistatic signals in MICROS.

GO terms enriched by epistatic genes (-log10 P_int > 7.3) in each
cohort were compared (Supplementary Table S2), identifying 9 terms
common to all four cohorts (Table 3) that might indicate common
regulation mechanisms (eg, GO:0008038 - neuron recognition).
Among the epistatic genes that enriched the 9 GO terms, we found
19 epistatic genes shared by the four cohorts, of which 15 are
previously published GWA loci (mostly not genome-wide significant)
associated with various phenotypes 34 (Supplementary Table S3).
Most of the 19 shared epistatic genes interacted with one another,
although their interactions were in general relatively weak (-log10 P_int < 7.3,
Supplementary Table S4). These genes include CDH13 (cadherin 13)
associated with height 35 and SORCS2 (sortilin-related VPS10 domain
containing receptor 2) associated with circulating insulin-like growth
factor 1 and insulin-like growth factor-binding protein-3, which are
important for anthropometric traits and risk of cancer and
cardiovascular disease. 36

Table 2 Number of SNP pairs and genes annotated at -log10 P_int > 7.3 in each study cohort a

Cohort     SNP pair   Gene annotated   SNP pair with BMI loci   BMI loci
Vis        1365       1011             7                        5
Korcula    1237       971              5                        3
ORCADES    1335       890              4                        2
MICROS     785        639              3                        2

a Vis and Korcula represent CROATIA-Vis and CROATIA-Korcula, respectively.

Table 3 Common GO terms enriched by epistatic genes in CROATIA-Vis (Vis), CROATIA-Korcula (Korcula), ORCADES and MICROS a

GO term      Description            Vis        Korcula    ORCADES    MICROS
GO:0008038   Neuron recognition     2.07E-06   1.78E-05   8.38E-07   1.13E-04
GO:0022610   Biological adhesion    2.49E-06   3.16E-05   3.72E-05   3.59E-05
GO:0007155   Cell adhesion          2.49E-06   3.16E-05   3.72E-05   3.59E-05
GO:0044459   Plasma membrane part   1.51E-09   5.88E-08   6.06E-13   1.54E-05
GO:0042995   Cell projection        1.16E-07   6.42E-06   1.63E-07   1.81E-07
GO:0043005   Neuron projection      1.23E-05   7.05E-06   2.56E-06   1.59E-07
GO:0044425   Membrane part          2.82E-05   1.43E-05   2.00E-11   3.25E-06
GO:0044456   Synapse part           4.15E-05   6.97E-04   7.50E-05   4.07E-04
GO:0005886   Plasma membrane        6.02E-04   1.85E-06   4.29E-07   6.28E-06

a The process, function and component GO terms are in the first, second and third panels,
respectively.

We further tested replication of the SNP pairs involving either BMI
loci (19, Table 2) or two shared epistatic genes across the study cohorts
(50, Supplementary Table S4). Although none of the 69 SNP pairs
was genome-wide significant, eight of them replicated in
one or more cohorts at the SNP level (ie, -log10 P_int > 1.30; Table 4).
The best replicated pairs at the SNP level were rs2202167 (NRXN3) -
rs11150880 (-log10 P_int was 8.19, 1.68 and 1.43 in CROATIA-Korcula,
CROATIA-Vis and ORCADES, respectively) and rs1474056 (MTCH2)
- rs7250947 (PLIN4) (-log10 P_int was 8.08 in ORCADES and 2.44 in
CROATIA-Korcula). The rs11150880 SNP is near the RPH3AL gene,
which is known to have a key role in insulin secretion by pancreatic
cells. 37 The PLIN4 gene may be important for intracellular and neutral
lipid storage droplets. 38 The eight replicated SNP pairs together
explained 4, 4, 2 and 0.5% of the phenotypic variance of BMI in
CROATIA-Vis, CROATIA-Korcula, ORCADES and MICROS, respectively.
By testing replication at the region level, we found that the pair
rs9858278 (NAALADL2) - rs7198915 (CDH13) replicated in CROATIA-Vis,
CROATIA-Korcula and MICROS (exceeding the 5% thresholds,
Table 4 and Supplementary Table S5). Further testing the nine
replicated SNP pairs in the NFBC1966 cohort found none replicated
at either the SNP or region levels. However, seven out of the nine
pairs had -log10 P_int > 2 at the region level, of which three exceeded the 20% thresholds
(Table 4 and Supplementary Table S5).

DISCUSSION 

Gene-gene interactions have been suggested as sources of the hidden
genetic variation in GWA studies, 6,7 but the extent of their role in this
regard has yet to be demonstrated. One big challenge is that the
sample sizes of many GWA data sets are relatively small (eg, less than
4000 individuals) and hence the power to detect epistasis could be
low. 8,13 Therefore studying epistasis in a single GWA population is
unlikely to be fruitful. This is certainly true in our case, where
exhaustive genome scans in the four study cohorts found no
genome-wide significant epistasis associated with BMI. We suggest
tackling the challenge by looking for common (thus potentially
important) gene-gene interactions from sub genome-wide significant
epistatic signals (-log10 P_int > 7.3) in multiple GWA populations.
We showed that GO enrichment analysis could be used to identify
common GO terms (ie, gene function groups) enriched by the epistatic
signals in the four study cohorts, from which 19 shared epistatic genes
were identified. Most of the 19 shared epistatic genes are previously
identified GWA loci associated with phenotypes other than BMI and
interacted with one another. Their interactions were considered
potentially important because they belong to one or multiple commonly
enriched GO terms. Interactions involving at least one of the
32 BMI loci with -log10 P_int > 7.3 were also considered potentially
important, assuming the BMI loci are likely to be interactive.

Table 4 Replicated interactions involving either the BMI loci or two shared epistatic genes across cohorts a

SNP1        Chr1  Gene1     SNP2        Chr2  Gene2     Vis              Korcula          ORCADES          MICROS           NFBC1966
rs953104    1     ATP1A4    rs8008553   14    PRKD1^b   8.46^c (5.53^d)  0.27 (2.84)      0.19 (1.59)      1.34^c (2.13)    0.12 (2.11)
rs10256700  7     CNTNAP2   rs10833498  11    NELL1     0.03 (1.94)      6.05^c (5.04^d)  0.02 (1.66)      1.35^c (2.27)    0.42 (2.68^f)
rs587791    9     PTPRD     rs1408219   13    GPC6      5.05^c (3.71^d)  0.45 (1.43)      0.06 (2.31)      1.41^c (2.60)    0.38 (2.05)
rs3847291   9     PTPRD     rs1554941   21    DSCAM     2.21^c (2.91^e)  0.59 (2.07)      5.03^c (3.01^e)  0.38 (1.46)      0.77 (2.03)
rs1474056   11    MTCH2^b   rs7250947   19    PLIN4     0.19 (1.56)      2.44^c (1.62)    8.07^c (7.84^d)  0.36 (2.60)      0.25 (1.24)
rs3783297   14    PRKD1^b   rs7146371   14    (-)       0.85 (1.86)      7.52^c (6.77^d)  0.48 (2.13)      1.67^c (1.76)    0.07 (2.90^f)
rs2202167   14    NRXN3^b   rs11150880  17    (-)       1.68^c (2.51)    8.19^c (5.98^d)  1.43^c (2.96)    0.29 (1.54)      0.10 (2.25)
rs11071868  15    MEGF11    rs2665272   16    FTO^b     7.97^c (3.81^d)  1.38^c (2.30)    0.31 (1.73)      0.03 (2.39)      0.09 (2.80^f)
rs9858278   3     NAALADL2  rs7198915   16    CDH13     4.24^c (3.48^d)  0.01 (3.17^d)    0.43 (2.08)      0.93 (3.16^d)    0.02 (1.20)

a SNP1 (SNP2): the first (second) SNP name; chr1 (chr2): the chromosome where SNP1 (SNP2) is located; gene1 (gene2): symbol of the gene annotated by SNP1 (SNP2); cell entries: -log10 P_int value at the SNP
(region) level; Vis: CROATIA-Vis; Korcula: CROATIA-Korcula; (-): no gene annotated; 5%, 10% and 20% thresholds derived for replication at the region level are given in Supplementary Table S5.
b One of the BMI loci.
c -log10 P_int > 1.30 at the SNP level.
d Significant at the 5% threshold.
e Significant at the 10% threshold.
f Significant at the 20% threshold.

Being aware of possible noise in those potentially important
interactions, we used replication to identify the most reliable epistatic
signals across the study cohorts. Eight epistatic pairs involving either
the BMI loci or two shared epistatic genes showed replication at the
SNP level in at least one cohort (Table 4). The eight epistatic pairs
together could indeed explain a considerable proportion of the BMI
variation in each individual cohort. Nevertheless, caution is
recommended in light of the potential overestimation of the effects due to
the 'winner's curse'. 39 Besides, none of the eight epistatic pairs was
replicated in all of the four study cohorts, or in the replication cohort
NFBC1966. Further replication tests in other populations and/or
functional assays are needed to confirm whether they are true signals.

Statistical replication has been used as the gold standard to prevent
reporting false positives in GWA studies. This however appears to be
far more difficult for epistatic signals than for single SNP signals for
reasons including power, changes in minor allele frequency, and linkage
disequilibrium between the epistatic SNP and the mutant at both loci. 16 The
moderate -log10 P_int values of the epistatic pairs tested for replication
suggest that the linkage disequilibrium between epistatic SNPs and
mutants is not high, so replication of these pairs will be difficult.
Furthermore, different environments may cause different phenotype
distributions in the discovery and replication cohorts. The lack of
replication in the NFBC1966 cohort could be due to two important
environmental factors of BMI: age 40 (ie, 31 years vs a range between 18 and
90 in the study cohorts) and diet. 29

The approach based on common gene-gene interactions in multiple
GWA populations is an effective solution to the issue of limited
power of detection of epistasis. It is just a partial solution though,
because some ignored interactions may be important as well.
Comparison of sub genome-wide significant epistatic signals across multiple
populations can be made at the SNP, gene or pathway
level and seems more fruitful at the gene or pathway level than at the SNP
level. The approach may become more useful if better annotation
methods (which currently consider only GWA signals without interactions) 41 can
be adapted to epistasis. For example, not all epistatic SNPs
were annotated to genes in the study and hence did not contribute
to the enrichment analysis. The approach will likely remain important
even once new tools for meta-analysis of epistasis in GWA data sets
become available to increase power for detection of epistasis.